Classification Trees With Unbiased Multiway Splits
Abstract
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations.
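The selection bias mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch of the problem only, not of the split methods proposed in the paper: it assumes a two-class response that is pure noise, a Gini-based exhaustive search over binary thresholds, and two equally uninformative predictors, one binary and one continuous. The function names (`gini`, `best_binary_split_gain`, `simulate`) are hypothetical and chosen for illustration.

```python
import random


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def best_binary_split_gain(xs, ys):
    """Largest Gini decrease over all thresholds 'x <= t' (labels are 0 or 1)."""
    n = len(ys)
    total = [ys.count(0), ys.count(1)]
    parent = gini(total)
    pairs = sorted(zip(xs, ys))
    left = [0, 0]
    best = 0.0
    for i in range(n - 1):
        left[pairs[i][1]] += 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold separates tied values
        right = [total[0] - left[0], total[1] - left[1]]
        n_left = i + 1
        child = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        best = max(best, parent - child)
    return best


def simulate(trials=500, n=100, seed=0):
    """Fraction of trials in which the continuous noise variable is selected."""
    rng = random.Random(seed)
    wins_many = 0
    for _ in range(trials):
        ys = [rng.randint(0, 1) for _ in range(n)]      # response: pure noise
        x_few = [rng.randint(0, 1) for _ in range(n)]   # binary: one candidate split
        x_many = [rng.random() for _ in range(n)]       # continuous: up to n - 1 splits
        if best_binary_split_gain(x_many, ys) > best_binary_split_gain(x_few, ys):
            wins_many += 1
    return wins_many / trials


if __name__ == "__main__":
    print(f"continuous variable selected in {simulate():.0%} of trials")
```

Neither predictor carries any information about the class, so an unbiased selection rule would pick each about half the time. Under these assumptions the exhaustive search favors the continuous variable simply because it offers more candidate splits to maximize over.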
Similar resources
Selecting Multiway Splits in Decision Trees
Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf. There are efficient algorithms for finding the optimal multiway split for a numeric attribute, given the number of intervals in which it is to be divided. The problem we tackle is how to choose this n...
Using minimum bootstrap support for splits to construct confidence regions for trees
Many of the estimated topologies in phylogenetic studies are presented with the bootstrap support for each of the splits in the topology indicated. If phylogenetic estimation is unbiased, high bootstrap support for a split suggests that there is a good deal of certainty that the split actually is present in the tree and low bootstrap support suggests that one or more of the taxa on one side of ...
Split Selection Methods for Classification Trees
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected...
Building Consensus with Balanced Splits
Given the multitude of sources for reconstructing the evolutionary history between entities, phylogenetic reconstruction methods often provide several trees classifying these entities. A key step in reconciling the different trees is to construct a consensus view of the history. If we consider each tree as a collection of laminar splits (two-way partitions), the primary problem in arriving at a...
Multiway Iceberg Cubing on Trees
The Star-cubing algorithm performs multiway aggregation on trees but incurs huge memory consumption. We propose a new algorithm MG-cubing that achieves maximal multiway aggregation. Our experiments show that MG-cubing achieves similar and very often better time and memory efficiency than Star-cubing.